LSTM input timestep optimization using simulated annealing for wind power predictions

Wind energy is a renewable energy source, like solar energy, and accurate wind power prediction can help countries deploy wind farms at locations that yield more electricity. For any prediction problem, determining the optimal time step (lookback) is of primary importance, because using information from previous timesteps can improve the prediction scores. Finding an optimal timestep is computationally expensive and may require a brute-force search that evaluates the deep learning model at each candidate time step. This article uses simulated annealing to find an optimal time step for wind power prediction. With the simulated annealing-based approach, the computation time needed to find an optimal time step was reduced from 166 hours to 3 hours. We tested the proposed approach on three different wind farms with a training set of 50%, a validation set of 25%, and a test set of 25%, yielding MSEs of 0.0059, 0.0074, and 0.010 for the three wind farms, respectively. The article presents the results in detail, not just the root mean square error.


Introduction
Countries are moving towards renewable energy sources [1][2][3] due to the recent increase in global warming, and sources like solar energy and wind energy can play a crucial role in reducing carbon dioxide emissions. Wind power is one energy source that can help generate electricity without pollutants. Wind power depends on the location where windmills are installed and on wind speed and direction. In [4][5][6][7], researchers illustrated the features that can be used for wind power prediction.
This article focuses on improving the performance of a deep learning algorithm for wind power prediction and reducing computational time. Previous timesteps are used to predict the current timestep for any prediction problem [8,9]. For instance, information from timesteps 1, 2, and 3 can be combined with timestep 4 to predict the wind power for timestep 4, but finding the optimal lookback is difficult because we do not know how much past information should be included to make the current prediction. There are many ways to find that information. For instance, we can use the brute-force technique from lookback = 1 to 500 and run LSTM for each lookback. There are several issues associated with this technique. Firstly, as we increase the lookback variable, the data size grows with it (each sample carries its own copy of the previous timesteps), so the training time per lookback and the total time over all lookbacks increase sharply. Secondly, we do not know how long we should train the LSTM model to obtain optimal results [10][11][12]. Thirdly, the optimal time step for the training data may not be optimal for the test set. These issues call for a better approach that finds an optimal lookback (how much previous timestep information should be used to predict the current timestep) and determines how long the machine learning model should be trained to yield optimal performance (formally, the number of epochs).
In this paper, we used a simulated annealing-based [13] + LSTM approach for wind power prediction, allowing us to find an optimal lookback in a limited number of epochs resulting in reduced training time.
The paper focuses on wind power prediction, but it is important to notice that it can also be used for other predictions. The previous timestep inclusion to predict the current timestep can significantly improve the performance. Optimization algorithms like particle swarm optimization [14], genetic algorithm, and hill-climbing can be used to find an optimal lookback. We used simulated annealing, which improves the current results generated using a genetic algorithm.
The important point to notice here is that the contribution of this article is not just using the simulated annealing with LSTM for wind power prediction but also how we used it to find an optimal look back, and it involves running some other computational steps which are documented in the methodology.
In these papers [15][16][17], researchers compared wind power prediction based on physical, statistical, and hybrid methods over different time scales.
The scientific contributions of this research work are as follows.
• We integrated simulated annealing (an optimization algorithm) and LSTM (a time series forecasting algorithm) to find an optimal lookback, reduce the computation time, and improve the forecasting performance.
• The proposed integration is also valid for other deep learning algorithms like BILSTM, GRU, and machine learning-based regression algorithms with minor modifications. It is also one of the future directions that we considered for optimization.

Materials and methods
This section explains the preprocessing performed on the dataset, the difference between three LSTM models, and the integration of simulated annealing with LSTM for wind power prediction.

Dataset division
The dataset we used for the analysis is available on this link. There are about 16 features and one power variable to predict. The training set is 50%, the test set is 25%, and the validation set is 25%. The next step is to transform the dataset based on the lookback parameter, which specifies how much previous time step information is included to make the current prediction. The transformation plays a key role in the whole process, and it can drastically increase the size of the dataset and the computation time. To understand the dataset transformation step, refer to Figs 1-3. In general, we do not know how much previous time step information should be included for the current prediction, so an efficient approach like simulated annealing is required to find the optimal lookback. The transformation step for any number of lookbacks is explained in the code. What follows describes how simulated annealing and LSTM work.
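As an illustration of the lookback transformation described in this section, the following is a minimal sketch, not the code released with the paper. The helper name `make_windows` and the toy arrays are assumptions made for the example; it assumes that each sample stacks the current row with its `lookback` predecessors, which matches the (1 + lookback) factor used later in the size calculation.

```python
import numpy as np

def make_windows(features, power, lookback):
    """Stack each row with its `lookback` predecessors (hypothetical helper)."""
    features = np.asarray(features, dtype=float)
    power = np.asarray(power, dtype=float)
    n_rows = features.shape[0]
    if not 0 <= lookback < n_rows:
        raise ValueError("lookback must be in [0, n_rows)")
    # Sample i covers rows i-lookback .. i, so each window holds lookback + 1 rows.
    X = np.stack([features[i - lookback:i + 1] for i in range(lookback, n_rows)])
    # The target is the power value at the last (current) row of each window.
    y = power[lookback:]
    return X, y

features = np.arange(20.0).reshape(10, 2)   # toy data: 10 rows, 2 features
power = np.arange(10.0)
X, y = make_windows(features, power, lookback=3)
print(X.shape, y.shape)  # (7, 4, 2) (7,)
```

The resulting array has the (samples, timesteps, features) layout that recurrent layers typically expect, and the number of stored rows (7 × 4 = 28) shows how the transformed data grows with the lookback.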

Simulated annealing
Simulated Annealing (SA) is an optimization technique for locating global optima. Simulated annealing employs the objective function of an optimization problem, which in our case is the MSE. The root mean square error or explained variance can also be used as the objective function, but we used the mean square error.
The method works similarly to a hill-climbing algorithm; however, instead of always choosing the best move, it chooses a random move. If the chosen move improves the solution, it is always accepted. Otherwise, the move is accepted with a probability of less than one. This probability falls exponentially with the "badness" of the move and is also determined by the parameter T (temperature): uphill moves are more likely at higher values of T. Fig 4 shows the simulated annealing algorithm. Table 1 shows the simulated annealing parameters. This work extends our previously proposed algorithm, which used a genetic algorithm with LSTM for time step optimization. In that algorithm, we optimized the lookback and the number of neurons in each layer, which took a lot of time. The focus of that work was on improving the performance but not on reducing the computation time, so we decided to use simulated annealing to improve the performance (lookback) and reduce the computation time. The success of a genetic algorithm depends on the number of generations and the number of instances in each generation. LSTM has to be executed 100 times for ten generations and ten instances, whereas simulated annealing requires only a small number of iterations to find an optimal lookback.
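The accept/reject rule described above can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the step size, initial temperature, geometric cooling factor, and the toy objective are all assumptions chosen for the example and are not the values from Table 1.

```python
import math
import random

def simulated_annealing(objective, start, bounds, iterations=20,
                        step=25, temperature=1.0, cooling=0.9, seed=0):
    """Minimise objective(lookback) over integers in bounds (illustrative values)."""
    rng = random.Random(seed)
    low, high = bounds
    current, current_cost = start, objective(start)
    best, best_cost = current, current_cost
    for _ in range(iterations):
        # Random move: a nearby lookback, clipped to the allowed range.
        candidate = min(high, max(low, current + rng.randint(-step, step)))
        cost = objective(candidate)
        delta = cost - current_cost
        # Always accept improvements; accept worse moves with probability
        # exp(-delta / T), which shrinks as the move gets worse or T falls.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, cost
        if current_cost < best_cost:
            best, best_cost = current, current_cost
        temperature *= cooling
    return best, best_cost

# Toy objective standing in for "train the LSTM and return the validation MSE".
toy_mse = lambda lookback: ((lookback - 300) / 500) ** 2
best, best_cost = simulated_annealing(toy_mse, start=250, bounds=(1, 500))
print(best, best_cost)
```

In a real run, each call to the objective would train the model once, so the total cost is roughly the number of iterations times the cost of one training run.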

LSTM
LSTM is a powerful time series prediction algorithm used for genetics [23], wind power prediction [24,25], text processing [26], and human action prediction. The LSTM cell comprises three components, as illustrated in Fig 5, each of which serves a different function. The first component determines whether the information from the last timestamp should be remembered or is irrelevant and can be ignored. In the second component, the cell attempts to learn new information from the input. Finally, in the third component, the cell passes the updated information from the current timestamp to the next timestamp. These three components are the gates of an LSTM cell: the Forget gate is the first component, the Input gate is the second, and the Output gate is the third. Table 2 shows the LSTM architecture for the deep learning model.
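To make the role of the three gates concrete, the following is a small NumPy sketch of one step of a standard LSTM cell. It is a textbook formulation, not the architecture from Table 2; the weight shapes, the hidden size, and the random inputs are assumptions for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step; W, U, b hold the stacked gate parameters."""
    hidden = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b            # shape (4 * hidden,)
    f = sigmoid(z[0:hidden])                # forget gate: how much old cell state to keep
    i = sigmoid(z[hidden:2 * hidden])       # input gate: how much new information to write
    g = np.tanh(z[2 * hidden:3 * hidden])   # candidate values learned from the input
    o = sigmoid(z[3 * hidden:4 * hidden])   # output gate: what to pass to the next step
    c_t = f * c_prev + i * g                # updated cell state
    h_t = o * np.tanh(c_t)                  # updated hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
n_features, hidden = 16, 8   # 16 input features; the hidden size is arbitrary
W = rng.normal(size=(4 * hidden, n_features)) * 0.1
U = rng.normal(size=(4 * hidden, hidden)) * 0.1
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(5, n_features)):   # a window of 5 timesteps
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)
```

The loop over the window shows why a larger lookback makes every sample more expensive: the cell is applied once per timestep in the window.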

Simulated annealing objective function
This section discusses and explains the objective function of simulated annealing. The objective of simulated annealing is to reduce the validation mean square error (see Eq 4), which is calculated using y_validation_data and predicted_values = LSTM.predict(X_validation_data).
In Eq 4, n represents the number of samples in the validation set, x represents the predicted samples, y represents the actual samples, and i represents the ith instance.
The following text unpacks the objective function, which minimizes the mean square error (MSE).
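One way to read this objective is sketched below. The callables `make_data` and `fit_predict` are hypothetical placeholders for the lookback transformation and for training the LSTM and calling its predict method; a least-squares model on a toy sine series is swapped in here only so that the sketch runs without a deep learning library.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean square error over the n validation samples."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def objective(lookback, make_data, fit_predict):
    """Validation MSE for one candidate lookback.

    make_data(lookback) -> (X_train, y_train, X_val, y_val)   # hypothetical
    fit_predict(X_train, y_train, X_val) -> predictions        # hypothetical
    """
    X_train, y_train, X_val, y_val = make_data(lookback)
    predicted_values = fit_predict(X_train, y_train, X_val)
    return mse(y_val, predicted_values)

# Toy stand-ins (assumptions): a sine series and a linear least-squares model.
def toy_data(lookback):
    t = np.arange(200, dtype=float)
    series = np.sin(t / 10.0)
    X = np.stack([series[i - lookback:i] for i in range(lookback, len(series))])
    y = series[lookback:]
    split = len(y) // 2
    return X[:split], y[:split], X[split:], y[split:]

def least_squares_fit_predict(X_train, y_train, X_val):
    coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    return X_val @ coef

print(mse([1.0, 2.0], [1.0, 4.0]))   # 2.0
print(objective(5, toy_data, least_squares_fit_predict))
```

Because the objective only sees the validation split, the test set stays untouched during the search and can be used afterwards to check for overfitting.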

The computation cost of the proposed algorithm
The following calculation shows the time for simple LSTM when lookback is increased from 1 to N.

Time for lookback = N = O(N)
The above equations show the LSTM training time for each lookback from 1 to N, and the total time is shown in Eq 5.
The following calculation shows the time when simulated annealing is used to find an optimal lookback. Consider the worst-case scenario in which lookback = N is selected in all 20 iterations.

Time for 20 iterations with lookback
Compare Eq 5 with Eq 6. The computation time of Eq 6 (when simulated annealing is used with LSTM for prediction) is far less than that of Eq 5 (when lookback is increased from 1 to N). If the number of iterations = N, the cost for both approaches becomes the same.
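The comparison can be made concrete with a small calculation. The sketch below assumes a purely linear cost model, in which training at lookback k costs k units, and ignores constants such as the number of epochs; the function names are hypothetical and the numbers are unit counts, not measured times.

```python
def brute_force_cost(n, unit=1.0):
    """Total cost of training once per lookback 1..n when lookback k costs k * unit."""
    return unit * n * (n + 1) / 2

def annealing_worst_case_cost(n, iterations=20, unit=1.0):
    """Worst case: every iteration evaluates the most expensive lookback n."""
    return unit * iterations * n

n = 500
print(brute_force_cost(n))            # 125250.0
print(annealing_worst_case_cost(n))   # 10000.0
print(brute_force_cost(n) / annealing_worst_case_cost(n))  # 12.525
```

Under this simplified model, the brute-force sweep is already more than ten times as expensive as the worst case of 20 simulated annealing iterations, and the gap widens as N grows.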

Stress analysis on lookback
When simulated annealing is used to find the optimal lookback, we specify a particular range in which simulated annealing should look for the next lookback. Increasing lookback directly affects the dataset's size because this increase results in data replication. Consider the following calculation to understand the size of the dataset in memory as the lookback is increased linearly.
Eq 7 shows the relationship between the size of the dataset and lookback.
Total size in memory = (Total Rows − lookback) × (1 + lookback)    (7)
• For lookback = 9; total size = (10 − 9)(1 + 9) = 10
If we increase the lookback, the size of the dataset increases, so we cannot consider all the lookbacks when using simulated annealing. There must be a specific bound on lookback; otherwise, the system memory would not be able to handle it.
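Eq 7 can be evaluated directly to see how quickly the transformed data grows. In the sketch below, the function name `windowed_size` and the 10,000-row dataset are assumptions used only for illustration.

```python
def windowed_size(total_rows, lookback):
    """Eq 7: number of stored rows after the lookback transformation."""
    if not 0 <= lookback < total_rows:
        raise ValueError("lookback must be smaller than the number of rows")
    return (total_rows - lookback) * (1 + lookback)

print(windowed_size(10, 9))   # 10, the worked example in the text
print([windowed_size(10_000, k) for k in (1, 100, 500)])
# [19998, 999900, 4759500]
```

For a lookback that is small relative to the number of rows, the size grows roughly in proportion to the lookback, which is why an upper bound on the search range keeps memory use manageable.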

Results
This section compares each method's computation time and performance: Naïve LSTM (a simple LSTM model with lookback = 1 and epochs = 200), Simple LSTM (a simple LSTM model with lookback = 1-500 and epochs = 30), and LSTM with Simulated Annealing (an LSTM model with 20 iterations of simulated annealing, 30 epochs, and lookback = 1-500). We observed that LSTM's performance for wind power prediction could be increased by tuning two parameters: the first is the number of epochs (the number of times that the learning algorithm works through the entire training set), and the second is the lookback (the number of previous time steps used to predict the current time step, or the window size in LSTM terms). For Naïve LSTM we considered 500 epochs and lookback = 1; it took 134 seconds and yielded an MSE of about 0.25 for wind farm 1. It executed quickly, but the difference between the predicted values and actual values was very high, as shown in Table 3.
For Simple LSTM we considered epochs = 30 and lookback = [1-500] to find an optimal solution. We considered lookbacks from 1 to 500 because we do not know where the optimal lookback lies, so finding it requires iterating over all lookbacks. This brute-force technique works, but the computation time increases rapidly. The time for epochs = 200 and lookback = 1-500 is 166 hours, as shown in Fig 6, which is extremely computationally expensive. Moreover, the size of the dataset increases as we increase the lookback, as shown in Fig 7. The difference (MSE, RMSE, r2_score, explained variance, and MAE) between the actual values and predicted values is shown in Fig 8. The last method is LSTM with Simulated Annealing, in which we tried to reduce the computation time needed to find an optimal lookback without affecting the performance. We used simulated annealing in two steps: the first is to find the optimal lookback, and the second is to use the optimal lookback and increase the epochs to 200. For simulated annealing-based LSTM (20 iterations of simulated annealing to find an optimal lookback and 30 epochs), the computation time was 2.67 hours.
The two-tailed p-values for each metric, comparing the results obtained without and with simulated annealing, are shown in Table 8.
It is essential to understand overfitting and underfitting when simulated annealing is used for optimization. The overfitting and underfitting of simulated annealing can be inferred from the results. In the case of simulated annealing, the optimal lookback can be a sub-optimal solution, but whether the optimal/sub-optimal lookback overfits/underfits on the training or not can be inferred from the test MSE.
The test data is not used to find an optimal lookback, so the LSTM performance without simulated annealing and with simulated annealing can be used to see whether the model overfits the training data. The MSE on the test data is reduced with simulated annealing, which means the model does not overfit the training data when simulated annealing is used to find an optimal lookback. As far as underfitting is concerned, the model does not underfit because when the optimal lookback is used to train the model, the training MSE is reduced compared to when simulated annealing is not used.
The first interesting observation is the reduced computation time. Consider the results for wind farm 1 (Table 4). The simulated annealing algorithm starts with lookback = 220 and proceeds through the iterations. In each iteration, it mutates the lookback, rearranges the training data accordingly, and reports the evaluation metrics. The results for lookback = 236, shown in row 8, are the best, so we used that lookback to train the model for 500 epochs. Compare the same process with the Simple LSTM, which has to train a model for every lookback from 1 to 500. Second, the lookbacks for wind farms 1, 2, and 3 are different because the lookback is optimized for a particular wind farm. Third, we minimized the mean square error as the objective function. However, other metrics such as explained variance or mean absolute error can also serve as the objective function.
The computation cost of the whole process depends on the number of iterations of the simulated annealing. Consider the number of epochs = 30, a bounded lookback range, and 20 iterations of simulated annealing. For each iteration, the computational time will be different because it depends on the lookback, as shown in the diagram. The time the algorithm takes when the lookback is linearly increased from 1 to 500 is the sum of the times for all lookbacks, which is greater than 20 × T500, where T500 is the time for lookback = 500. After finding the best lookback, the final step is to use LSTM with the best lookback for a specific number of epochs. The number of epochs in the final iteration is greater than the number of epochs used for the simulated annealing. That is how simulated annealing reduces the execution time without reducing the performance.

Conclusion
This section includes the concluding remarks, future directions, and the limitations of the proposed algorithm. In this article, we proposed a simulated annealing-based LSTM for wind power prediction, which reduces the time to find an optimal lookback for LSTM prediction while reducing the loss. Rather than using simulated annealing, a particle swarm optimization algorithm or a combination of genetic algorithms and simulated annealing can be used for more robust predictions.
Due to limited space, we considered only 500 lookbacks (2009-07-02-13 to 2009-07-23-07 = 21 previous days), but in reality, data for more than one month should be considered to make the current prediction, which we believe can significantly improve the prediction performance. It also depends on the season; for example, the wind velocity is not constant throughout the year for a particular area, so if we consider a very large lookback, the model's performance may degrade.
We considered wind power prediction, and in the future, we plan to do it for other forecasting problems related to the energy sector like solar power prediction. Second, we considered only one optimization algorithm, simulated annealing, but optimization algorithms like particle swarm optimization and genetic algorithm can also be used to benchmark the performance. Benchmarking is necessary because different optimization algorithms yield different optimal lookbacks, and simulated annealing tries only a limited number of lookbacks (same as the number of iterations) to find an optimal one. Third, we considered only the lookback parameter for optimization, but simulated annealing can also be used to optimize multiple parameters like the number of epochs, which will undoubtedly increase the overall computation cost. The LSTM can be replaced with BILSTM, GRU, and machine learning-based regression algorithms with minor modifications.
The computer and library specifications used for implementing the models and generating the results are as follows. The system specifications are: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz, 16 GB. All the code and the dataset used to demonstrate the methodology of this paper are available at this link. The NaiveLSTM directory contains a simple LSTM code (window size 1 and 500 epochs) for wind power prediction for 3 datasets. The SimpleLSTM directory contains a simple LSTM code (lookback [1-500] and 200 epochs) for wind power prediction for 3 datasets. The SimulatedAnnealing/code-find optimal lookback.py file contains a simulated annealing LSTM code (to find the optimal lookback with 30 epochs) for wind power prediction for 3 datasets. The SimulatedAnnealing/code-use optimal lookback.py file contains the LSTM code that uses the optimal lookback for wind power prediction for 3 datasets.

Author Contributions
Conceptualization: Muhammad Muneeb.

Table 8. This shows the two-tailed p-value between both results (without optimization algorithm and with simulated annealing) for all metrics and each wind farm. Values from Tables 3 and 7 were used to find the two-tailed p-values. https://doi.org/10.1371/journal.pone.0275649.t008